Skip to content

fix: clean up failed conversation data to avoid conflict on retry#2984

Open
yhjun1026 wants to merge 1 commit intomainfrom
fix/agent-conversation-conflict
Open

fix: clean up failed conversation data to avoid conflict on retry#2984
yhjun1026 wants to merge 1 commit intomainfrom
fix/agent-conversation-conflict

Conversation

@yhjun1026
Copy link
Collaborator

Summary

  • Fix data conflict error when restarting agent conversation after an unexpected failure
  • Add _cleanup_failed_conversation method to remove failed conversation records and associated messages/plans
  • Detect RUNNING or FAILED state in last conversation before creating a new one
  • Clean up residual data before generating a new conversation ID

Problem

When an agent conversation fails unexpectedly (leaving state as RUNNING or FAILED), subsequent attempts to start a new conversation would cause a database conflict error due to duplicate conv_id. The conversation ID generation was based on counting existing records, which didn't account for failed conversations.

Solution

  1. Added _cleanup_failed_conversation method that:

    • Deletes messages associated with the failed conversation
    • Deletes plans associated with the failed conversation
    • Removes the conversation record itself
  2. Modified conversation ID generation logic to:

    • Check if the last conversation is in RUNNING or FAILED state
    • Clean up the failed conversation data before creating a new one
    • Regenerate the conversation list after cleanup

Test plan

  • Test starting a new conversation after a failed one
  • Verify no duplicate conv_id errors occur
  • Verify failed conversation data is properly cleaned up

🤖 Generated with Claude Code

When an agent conversation fails unexpectedly (leaving state as RUNNING
or FAILED), subsequent attempts to start a new conversation would cause
a database conflict error due to duplicate conv_id.

This fix:
- Adds _cleanup_failed_conversation method to remove failed conversation
  records and associated messages/plans
- Detects RUNNING or FAILED state in last conversation before creating
  a new one
- Cleans up residual data before generating a new conversation ID

Co-Authored-By: Claude (GLM-5) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 2, 2026 12:56
@github-actions github-actions bot added the fix Bug fixes label Mar 2, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a database conflict error that occurred when retrying an agent conversation after an unexpected failure. When a conversation ends in RUNNING or FAILED state, subsequent new conversations would fail with a duplicate conv_id error because the ID generation was count-based and didn't account for the leftover failed records.

Changes:

  • Added _cleanup_failed_conversation method to MultiAgents that deletes messages, plans, and the conversation record for a failed conversation
  • Modified the agent_chat_v2 method to detect RUNNING/FAILED states in the last conversation and clean up before creating a new one
  • Added an agent_conv_id = None initialization and moved the conversation ID generation logic to after the cleanup block

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +172 to +191
try:
# Delete messages associated with this conversation
self.gpts_messages_dao.delete_chat_message(conv_id)
logger.info(f"Deleted messages for failed conversation: {conv_id}")
except Exception as e:
logger.warning(f"Failed to delete messages for {conv_id}: {e}")

try:
# Delete plans associated with this conversation
self.memory.plans_memory.remove_by_conv_id(conv_id)
logger.info(f"Deleted plans for failed conversation: {conv_id}")
except Exception as e:
logger.warning(f"Failed to delete plans for {conv_id}: {e}")

try:
# Delete the conversation record
self.gpts_conversations.delete_chat_message(conv_id)
logger.info(f"Deleted failed conversation record: {conv_id}")
except Exception as e:
logger.warning(f"Failed to delete conversation record {conv_id}: {e}")
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The gpts_conversations.delete_chat_message(conv_id) call (and similarly gpts_messages_dao.delete_chat_message(conv_id)) uses a LIKE '%conv_id%' pattern inside the DAO, where conv_id is the failed conversation's ID in the format {base_conv_id}_{N} (e.g., "abc123_1").

This LIKE pattern (%abc123_1%) will match not only the target conversation abc123_1, but also any record whose conv_id contains abc123_1 as a substring — for example abc123_10, abc123_11, abc123_19, etc. If the user has had 10+ conversation turns under the same base_conv_id, those higher-numbered conversations (and their associated messages) will be silently deleted when only the failed _1 record should be cleaned up.

To fix this, instead of relying on the LIKE-based delete_chat_message method, a new or existing exact-match deletion method should be used, or the cleanup should use an equality filter (conv_id == failed_conv_id) rather than a substring match.

Copilot uses AI. Check for mistakes.
Comment on lines +186 to +189
try:
# Delete the conversation record
self.gpts_conversations.delete_chat_message(conv_id)
logger.info(f"Deleted failed conversation record: {conv_id}")
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method called to delete the conversation record is self.gpts_conversations.delete_chat_message(conv_id). This is the GptsConversationsDao.delete_chat_message() method, whose name strongly implies it deletes messages — but it actually deletes entries from the gpts_conversations table. The comment on line 187 correctly says "Delete the conversation record", but the method name creates confusion for readers and could lead to maintainability issues. A more appropriately named method (e.g., delete_by_conv_id) should be used or added to the DAO.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

fix Bug fixes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants